
Implement native async client#617

Merged
joe-clickhouse merged 46 commits into main from
joe/141-a-database-client-should-be-based-on-asyncio
Mar 26, 2026

Conversation

@joe-clickhouse
Contributor

@joe-clickhouse joe-clickhouse commented Jan 15, 2026

Summary

Replaces the old executor-based AsyncClient, which wrapped the sync HttpClient in a ThreadPoolExecutor, with a native async implementation built on aiohttp. The public API surface is unchanged: clickhouse_connect.get_async_client() returns an AsyncClient with the same methods. The difference is entirely under the hood, where real async I/O replaces thread-pool delegation.
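A minimal usage sketch (host and port are placeholders, and the `close()` call is assumed to mirror the sync client's API):

```python
import asyncio

async def fetch_one(host: str = "localhost", port: int = 8123):
    # clickhouse_connect is imported inside the coroutine so this sketch can
    # be defined without the package installed; real code would import it at
    # module level.
    import clickhouse_connect
    client = await clickhouse_connect.get_async_client(host=host, port=port)
    try:
        result = await client.query("SELECT 1")
        return result.result_rows
    finally:
        await client.close()

# asyncio.run(fetch_one()) would run this against a live server.
```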

Why this change

The previous AsyncClient ran every operation in a thread pool via loop.run_in_executor(). This:

  • added thread overhead and context switching
  • limited the actual benefits of async I/O
  • complicated resource and session management

The new implementation performs HTTP I/O natively with aiohttp, giving real concurrency benefits for async workloads.

Design

Native async I/O

Requests use aiohttp.ClientSession with a configurable TCPConnector (pool limits, keepalive). HTTP response handling is fully async. aiohttp is an optional dependency installed via pip install clickhouse-connect[async].
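As a rough sketch of what the connector setup might look like (the parameter names below are aiohttp's own; the client's `connector_limit` / `connector_limit_per_host` options presumably map onto aiohttp's `limit` / `limit_per_host`):

```python
async def make_session(limit: int = 8, limit_per_host: int = 4):
    # aiohttp is an optional dependency, so it is imported lazily here.
    import aiohttp
    connector = aiohttp.TCPConnector(
        limit=limit,                    # total pooled connections
        limit_per_host=limit_per_host,  # per-host cap
        keepalive_timeout=30.0,         # seconds to keep idle sockets alive
    )
    return aiohttp.ClientSession(connector=connector)
```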

Streaming bridge for ClickHouse Native format

Native format parsing and serialization is synchronous CPU-bound work. The client uses a bounded queue in AsyncSyncQueue as a sync/async bridge so async network reads/writes can overlap with sync parsing/serialization in an executor.

On the query path in StreamingResponseSource, the async producer reads from the aiohttp response and the sync consumer parses in an executor. On the insert path in StreamingInsertSource, the sync producer serializes in an executor and the async consumer streams to aiohttp.
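AsyncSyncQueue's internals aren't shown here, but the query-path idea can be sketched with a bounded `queue.Queue` bridging an async producer and a sync consumer running in the default executor (names and chunk handling are illustrative, not the library's actual code):

```python
import asyncio
import queue

_SENTINEL = object()

async def pipeline(chunks, parse):
    """Overlap async 'network reads' with sync parsing in an executor.

    The bounded queue provides backpressure: the producer blocks (off-loop)
    when the parser falls behind.
    """
    loop = asyncio.get_running_loop()
    q: queue.Queue = queue.Queue(maxsize=4)

    def consume():
        # Sync consumer: runs in a worker thread, parsing chunks as they arrive.
        out = []
        while True:
            item = q.get()
            if item is _SENTINEL:
                return out
            out.append(parse(item))

    consumer = loop.run_in_executor(None, consume)
    for chunk in chunks:                # stands in for reading the aiohttp response
        await asyncio.sleep(0)          # yield to the loop, as a real read would
        await loop.run_in_executor(None, q.put, chunk)  # blocking put, off-loop
    await loop.run_in_executor(None, q.put, _SENTINEL)
    return await consumer

# asyncio.run(pipeline([b"a", b"b"], bytes.decode)) -> ["a", "b"]
```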

Event loop safety

Non-streaming query methods such as .query() and .query_df() are fully materialized inside the executor before returning. By the time a QueryResult is returned, all data is in memory, so synchronous iteration won't block the loop.

Streaming queries like .query_rows_stream(), .query_df_stream(), etc. detect synchronous iteration from within an async context and raise ProgrammingError immediately, prompting the caller to use async for instead.
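A usage sketch of the streaming path (the async context-manager shape here is assumed from the sync client's query_rows_stream API):

```python
async def collect_rows(client, sql: str):
    # Iterating the stream synchronously from async code raises
    # ProgrammingError; `async for` is required instead.
    rows = []
    async with client.query_rows_stream(sql) as stream:
        async for row in stream:
            rows.append(row)
    return rows
```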

Lazy imports

aiohttp is imported lazily so import clickhouse_connect works without it installed. Attempting to use the async client without aiohttp raises a clear ImportError with install instructions. Heavy optional dependencies (numpy, pandas, pyarrow, polars) are also lazily loaded, matching the sync client.

Breaking changes

  • AsyncClient(client=sync_client) no longer works. Use get_async_client() or create_async_client().
  • The executor_threads and executor parameters have been removed from create_async_client().
  • pool_mgr is rejected on the async path with a message pointing to connector_limit / connector_limit_per_host.
  • The internal module clickhouse_connect.driver.aiohttp_client no longer exists. AsyncClient is importable from clickhouse_connect.driver as before.

Migration

# Before (no longer works):
sync_client = clickhouse_connect.get_client()
async_client = AsyncClient(client=sync_client)

# After:
async_client = await clickhouse_connect.get_async_client(host="...", port=8123)

The async client API is otherwise identical. All query, insert, and streaming methods have the same signatures.

Tests

The full integration test suite runs parametrized across both sync and async clients. Dedicated async tests in test_async_features.py cover concurrency, streaming cleanup, session protection, timeouts, and error isolation.

Performance

Benchmarks comparing the old executor-based client against the native async client showed speedups ranging from parity to 75% depending on workload, with a geometric mean improvement of around 16% across a wide range of realistic workloads. P95 latencies also improved significantly.

Trade-offs

  • ClickHouse Native format parsing and serialization is CPU-bound and still runs in a thread pool executor. The async benefit comes from I/O concurrency, i.e. overlapping network reads/writes with parsing, not from making the parsing itself async.
  • Non-streaming query results, e.g. from .query() and .query_df(), are fully materialized in the executor before returning to the caller, which is the expected behavior for those APIs. Streaming variants such as .query_rows_stream() are available for incremental processing.

Checklist

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

@joe-clickhouse joe-clickhouse linked an issue Jan 15, 2026 that may be closed by this pull request
@joe-clickhouse joe-clickhouse changed the title Joe/141 a database client should be based on asyncio Implement native async client Jan 16, 2026
Collaborator

@genzgd genzgd left a comment


This seems okay to me, although I can't claim to have done anything resembling a full review. A couple observations:

  • I'm curious as to where the improvements come from over the existing implementation, so I'm looking forward to that blog post.
  • There's a lot of duplicated code in the aiohttp_client. It would be nice to consolidate that somewhere.
  • The piece with the async queue is hard to follow -- I don't know how feasible it is, but it would be nice to remove that layer and just use some kind of async-based generator without wrapping the extra queue.

@joe-clickhouse
Contributor Author

Thanks @genzgd. To address your questions:

  • I think the improvements are mainly from true I/O <-> CPU pipelining. In the existing async client we run the sync client in an executor, and it effectively does read -> parse -> read sequentially in a single thread. In the new client, an async producer reads from aiohttp and pushes chunks into AsyncSyncQueue while the parser runs in a separate executor thread. Those stages actually overlap, giving true concurrency.
  • Agreed on the duplication. I avoided refactoring the shared sync client pieces for now to keep the changes fully separate while the async path is still new. Once it stabilizes, I can move common logic into the base client to reduce duplication.
  • I did try for quite a while to use simpler async‑generator patterns, but we need to keep using the synchronous NativeTransform parser. If we do parsing directly on the event loop, we lose the async benefit because the CPU‑heavy parsing blocks the loop. The queue lets the event loop keep reading from the socket while parsing runs off‑loop. Additionally, it provides backpressure/bounded buffering.

@genzgd
Collaborator

genzgd commented Jan 30, 2026

In the new client, an async producer reads from aiohttp and pushes chunks into AsyncSyncQueue while the parser runs in a separate executor thread. Those stages actually overlap, giving true concurrency.

If we do parsing directly on the event loop, we lose the async benefit because the CPU‑heavy parsing blocks the loop. The queue lets the event loop keep reading from the socket while parsing runs off‑loop. Additionally, it provides backpressure/bounded buffering.

Yes, as I think about it, that makes sense. It might be theoretically possible to run the sync HTTP client (and the buffer) in a separate thread from the parser, gaining a similar benefit. On a related note, making the transform step truly parallel would be challenging given that HTTP chunks won't align with Native format blocks, but that's another argument in favor of a TCP protocol client. :)

@joe-clickhouse
Contributor Author

For those interested, I have published a RC off this branch for testing and feedback: https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.12.0rc1

@nathan-gage

For those interested, I have published a RC off this branch for testing and feedback: https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.12.0rc1

this is so great! we will be trying this in our staging environment and report back.

@kbumsik

kbumsik commented Feb 19, 2026

Hi, I have been testing v0.12rc and I got an interesting improvement with Opentelemetry Context propagation for Async Client. This is somewhat related to #303 :

  • Before: when tracing urllib3, a new root span was created for each query() (which is bad) because urllib3 runs in a separate thread executor, and the otel context is not automatically propagated to the executor.
  • After: Tracing aiohttp grabs a proper otel context automatically.

@haydn-jones

Has largely been working well for me. I've noticed some intermittent server disconnect issues, but I suspect I'm the cause of these somehow.

@joe-clickhouse joe-clickhouse added the hold for 1.0.0 hold off merging until we're ready for 1.0.0 label Mar 25, 2026
@joe-clickhouse joe-clickhouse merged commit 2956318 into main Mar 26, 2026
37 checks passed
@thewhaleking

Is this included in 0.15.0?

@joe-clickhouse
Contributor Author

Hi @thewhaleking, no, I cut 0.15.0 as kinda like the last release before the official roll to 1.0.0. I'll have a 1.0.0rc1 out sometime in the next few weeks that will include this. I did release 0.12.0rc1 a while back, which you can grab from PyPI if you want to try this out, though.



Development

Successfully merging this pull request may close these issues.

A database client should be based on asyncio
